Characterizing and Improving Stability in Neural Style Transfer
Recent progress in image style transfer has focused on improving the quality
of stylized images and the speed of the methods. However, real-time methods are
highly unstable, resulting in visible flickering when applied to videos. In this
work we characterize the instability of these methods by examining the solution
set of the style transfer objective. We show that the trace of the Gram matrix
representing style is inversely related to the stability of the method. Then,
we present a recurrent convolutional network for real-time video style transfer
which incorporates a temporal consistency loss and overcomes the instability of
prior methods. Our networks can be applied at any resolution, do not require
optical flow at test time, and produce high-quality, temporally consistent
stylized videos in real time.
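A minimal PyTorch sketch of the two quantities the abstract centers on: the Gram matrix whose trace the authors relate to stability, and a temporal consistency loss that penalizes change between consecutive stylized frames. The shapes, the normalization, and the optional flow_warp hook are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def gram_matrix(features):
    # features: (C, H, W) feature maps from one layer of a CNN
    c, h, w = features.shape
    f = features.view(c, h * w)
    return f @ f.t() / (c * h * w)  # normalized C x C Gram matrix

def temporal_consistency_loss(stylized_prev, stylized_next, flow_warp=None):
    # Penalize per-pixel change between consecutive stylized frames.
    # A full implementation would warp the previous frame with optical flow
    # and mask occlusions; this sketch defaults to a static-scene assumption.
    ref = stylized_prev if flow_warp is None else flow_warp(stylized_prev)
    return torch.mean((stylized_next - ref) ** 2)

feats = torch.randn(64, 32, 32)
print(gram_matrix(feats).trace())  # the trace the paper links to stability
```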
WiForceSticker: Batteryless, Thin Sticker-like Flexible Force Sensor
Any two objects in contact with each other exert a force that could be simply
due to gravity or mechanical contact, such as a robotic arm gripping an object
or even the contact between two bones at our knee joints. The ability to
naturally measure and monitor these contact forces allows a plethora of
applications from warehouse management (detect faulty packages based on
weights) to robotics (making a robotic arm's grip as sensitive as human skin)
and healthcare (knee implants). It is challenging to design a ubiquitous force
sensor that can be used naturally for all these applications. First, the sensor
should be small enough to fit in narrow spaces. Next, it should not require
cumbersome cables to read the force values from the sensors. Finally, we need
to have a battery-free design to meet the in-vivo applications. We develop
WiForceSticker, a wireless, battery-free, sticker-like force sensor that can be
ubiquitously deployed on any surface, such as all warehouse packages, robotic
arms, and knee joints. WiForceSticker first designs a tiny millimeter-scale
capacitive sensor equipped with a millimeter-scale antenna, fabricated on a
flexible PCB substrate.
Secondly, it introduces a new mechanism that transduces the force information
onto ambient RF radiation by interfacing the sensor with COTS RFID systems, so
that a remotely located reader can read it wirelessly without requiring any
battery or active components at the force sensor. The sensor can detect forces
over a newton-scale range with fine sensing accuracy across multiple testing
environments, evaluated with repeated presses at varying force levels. We also
showcase two application case studies with our designed sensors: weighing
warehouse packages and sensing forces applied by bone joints.
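A hedged sketch of the readout principle as the abstract describes it: applied force changes the sensor's capacitance, which shifts the phase of the backscattered RFID signal, and the reader recovers force by inverting a calibration curve. The force range, phase model, and curve below are hypothetical placeholders, not the paper's measured characteristics.

```python
import numpy as np

# Hypothetical calibration: force compresses the capacitive sensor, which
# monotonically shifts the backscatter phase seen by the RFID reader.
calib_force = np.linspace(0.0, 6.0, 50)      # assumed force axis (N)
calib_phase = 0.4 * np.arctan(calib_force)   # assumed phase response (rad)

def force_from_phase(measured_phase):
    # Invert the monotonic calibration curve by interpolation.
    return np.interp(measured_phase, calib_phase, calib_force)

print(force_from_phase(0.4 * np.arctan(2.5)))  # recovers ~2.5 N
```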
VIMA: General Robot Manipulation with Multimodal Prompts
Prompt-based learning has emerged as a successful paradigm in natural
language processing, where a single general-purpose language model can be
instructed to perform any task specified by input prompts. Yet task
specification in robotics comes in various forms, such as imitating one-shot
demonstrations, following language instructions, and reaching visual goals.
They are often considered different tasks and tackled by specialized models. We
show that a wide spectrum of robot manipulation tasks can be expressed with
multimodal prompts, interleaving textual and visual tokens. Accordingly, we
develop a new simulation benchmark that consists of thousands of
procedurally-generated tabletop tasks with multimodal prompts, 600K+ expert
trajectories for imitation learning, and a four-level evaluation protocol for
systematic generalization. We design a transformer-based robot agent, VIMA,
that processes these prompts and outputs motor actions autoregressively. VIMA
features a recipe that achieves strong model scalability and data efficiency.
It outperforms alternative designs in the hardest zero-shot generalization
setting, achieving higher task success rates given the same training data.
With less training data, VIMA still performs better than
the best competing variant. Code and video demos are available at
https://vimalabs.github.io (ICML 2023).
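A toy sketch of the interface the abstract describes: textual and visual tokens are interleaved into a single prompt sequence, and a transformer maps the sequence to motor-action logits. Every dimension, the vocabulary size, the 512-d visual features, and the 7-way action head are placeholders; VIMA's actual architecture (object tokens, cross-attention decoding) is richer.

```python
import torch
import torch.nn as nn

class TinyVIMASketch(nn.Module):
    # Toy stand-in for a multimodal-prompt agent, not VIMA itself.
    def __init__(self, d=64, n_actions=7):
        super().__init__()
        self.text_embed = nn.Embedding(1000, d)   # assumed text vocabulary
        self.image_proj = nn.Linear(512, d)       # assumed visual features
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.action_head = nn.Linear(d, n_actions)

    def forward(self, text_ids, image_feats):
        # Interleave modalities into one prompt sequence (text first, here).
        tokens = torch.cat([self.text_embed(text_ids),
                            self.image_proj(image_feats)], dim=1)
        h = self.encoder(tokens)
        return self.action_head(h[:, -1])  # next-action logits

model = TinyVIMASketch()
logits = model(torch.randint(0, 1000, (1, 5)), torch.randn(1, 3, 512))
print(logits.shape)  # torch.Size([1, 7])
```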
Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks
Understanding human motion behavior is critical for autonomous moving
platforms (like self-driving cars and social robots) if they are to navigate
human-centric environments. This is challenging because human motion is
inherently multimodal: given a history of human motion paths, there are many
socially plausible ways that people could move in the future. We tackle this
problem by combining tools from sequence prediction and generative adversarial
networks: a recurrent sequence-to-sequence model observes motion histories and
predicts future behavior, using a novel pooling mechanism to aggregate
information across people. We predict socially plausible futures by training
adversarially against a recurrent discriminator, and encourage diverse
predictions with a novel variety loss. Through experiments on several datasets
we demonstrate that our approach outperforms prior work in terms of accuracy,
variety, collision avoidance, and computational complexity.
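The variety loss is concrete enough to sketch from the abstract alone: sample k candidate futures and backpropagate only through the one closest to the ground-truth path, so the generator learns to cover multiple plausible futures instead of averaging them. The trajectory shapes below are illustrative.

```python
import torch

def variety_loss(pred_samples, ground_truth):
    # pred_samples: (k, T, 2) -- k sampled future trajectories of T steps
    # ground_truth: (T, 2)    -- the true future path
    # Best-of-k loss: gradients flow only through the closest sample.
    errors = ((pred_samples - ground_truth.unsqueeze(0)) ** 2).sum(dim=(1, 2))
    return errors.min()

samples = torch.randn(20, 12, 2, requires_grad=True)  # k=20, T=12 steps
gt = torch.randn(12, 2)
print(variety_loss(samples, gt))
```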
RoboCat: A Self-Improving Foundation Agent for Robotic Manipulation
The ability to leverage heterogeneous robotic experience from different
robots and tasks to quickly master novel skills and embodiments has the
potential to transform robot learning. Inspired by recent advances in
foundation models for vision and language, we propose a foundation agent for
robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned
decision transformer capable of consuming multi-embodiment action-labelled
visual experience. This data spans a large repertoire of motor control skills
from simulated and real robotic arms with varying sets of observations and
actions. With RoboCat, we demonstrate the ability to generalise to new tasks
and robots, both zero-shot and through adaptation using only 100–1000
examples for the target task. We also show how a trained model itself can be
used to generate data for subsequent training iterations, thus providing a
basic building block for an autonomous improvement loop. We investigate the
agent's capabilities, with large-scale evaluations both in simulation and on
three different real robot embodiments. We find that as we grow and diversify
its training data, RoboCat not only shows signs of cross-task transfer, but
also becomes more efficient at adapting to new tasks.
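The self-improvement loop the abstract describes reduces to a simple control flow: train, roll out the trained agent, keep successful episodes as fresh training data, and repeat. The sketch below uses hypothetical stand-ins for training, rollout, and success filtering; it shows only the loop's structure, not RoboCat's actual pipeline.

```python
import random

def self_improvement_loop(agent, dataset, train, rollout, is_success, iters=3):
    # Hedged sketch: each iteration fits the agent on the current data,
    # generates new episodes with it, and folds the successes back in.
    for _ in range(iters):
        agent = train(agent, dataset)
        episodes = [rollout(agent) for _ in range(100)]
        dataset += [ep for ep in episodes if is_success(ep)]
    return agent, dataset

# Dummy stand-ins so the sketch runs end to end.
agent, data = self_improvement_loop(
    agent=0,
    dataset=[],
    train=lambda a, d: a + 1,
    rollout=lambda a: random.random(),
    is_success=lambda ep: ep > 0.5,
)
print(agent, len(data))
```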